semi-implicit variational inference
A Kernel Approach for Semi-implicit Variational Inference
Yu, Longlin, Cheng, Ziheng, Zhang, Shiyue, Zhang, Cheng
Semi-implicit variational inference (SIVI) enhances the expressiveness of variational families through hierarchical semi-implicit distributions, but the intractability of their densities makes standard ELBO-based optimization biased. Recent score-matching approaches to SIVI (SIVI-SM) address this issue via a minimax formulation, at the expense of an additional lower-level optimization problem. In this paper, we propose kernel semi-implicit variational inference (KSIVI), a principled and tractable alternative that eliminates the lower-level optimization by leveraging kernel methods. We show that when optimizing over a reproducing kernel Hilbert space, the lower-level problem admits an explicit solution, reducing the objective to the kernel Stein discrepancy (KSD). Exploiting the hierarchical structure of semi-implicit distributions, the resulting KSD objective can be efficiently optimized using stochastic gradient methods. We establish optimization guarantees via variance bounds on Monte Carlo gradient estimators and derive statistical generalization bounds of order $\tilde{\mathcal{O}}(1/\sqrt{n})$. We further introduce a multi-layer hierarchical extension that improves expressiveness while preserving tractability. Empirical results on synthetic and real-world Bayesian inference tasks demonstrate the effectiveness of KSIVI.
Importance Weighted Hierarchical Variational Inference
Variational Inference is a powerful tool in the Bayesian modeling toolkit, however, its effectiveness is determined by the expressivity of the utilized variational distributions in terms of their ability to match the true posterior distribution. In turn, the expressivity of the variational family is largely limited by the requirement of having a tractable density function. To overcome this roadblock, we introduce a new family of variational upper bounds on a marginal log-density in the case of hierarchical models (also known as latent variable models). We then derive a family of increasingly tighter variational lower bounds on the otherwise intractable standard evidence lower bound for hierarchical variational distributions, enabling the use of more expressive approximate posteriors. We show that previously known methods, such as Hierarchical Variational Models, Semi-Implicit Variational Inference and Doubly Semi-Implicit Variational Inference can be seen as special cases of the proposed approach, and empirically demonstrate superior performance of the proposed method in a set of experiments.
Continuous Semi-Implicit Models
Yu, Longlin, Zha, Jiajun, Yang, Tong, Xie, Tianyu, Zhang, Xiangyu, Chan, S. -H. Gary, Zhang, Cheng
Semi-implicit distributions have shown great promise in variational inference and generative modeling. Hierarchical semi-implicit models, which stack multiple semi-implicit layers, enhance the expressiveness of semi-implicit distributions and can be used to accelerate diffusion models given pretrained score networks. However, their sequential training often suffers from slow convergence. In this paper, we introduce CoSIM, a continuous semi-implicit model that extends hierarchical semi-implicit models into a continuous framework. By incorporating a continuous transition kernel, CoSIM enables efficient, simulation-free training. Furthermore, we show that CoSIM achieves consistency with a carefully designed transition kernel, offering a novel approach for multistep distillation of generative models at the distributional level. Extensive experiments on image generation demonstrate that CoSIM performs on par or better than existing diffusion model acceleration methods, achieving superior performance on FD-DINOv2.
Semi-Implicit Variational Inference via Kernelized Path Gradient Descent
Pielok, Tobias, Bischl, Bernd, Rรผgamer, David
Semi-implicit variational inference (SIVI) is a powerful framework for approximating complex posterior distributions, but training with the Kullback-Leibler (KL) divergence can be challenging due to high variance and bias in high-dimensional settings. While current state-of-the-art semi-implicit variational inference methods, particularly Kernel Semi-Implicit Variational Inference (KSIVI), have been shown to work in high dimensions, training remains moderately expensive. In this work, we propose a kernelized KL divergence estimator that stabilizes training through nonparametric smoothing. To further reduce the bias, we introduce an importance sampling correction. We provide a theoretical connection to the amortized version of the Stein variational gradient descent, which estimates the score gradient via Stein's identity, showing that both methods minimize the same objective, but our semi-implicit approach achieves lower gradient variance. In addition, our method's bias in function space is benign, leading to more stable and efficient optimization. Empirical results demonstrate that our method outperforms or matches state-of-the-art SIVI methods in both performance and training efficiency.
Importance Weighted Hierarchical Variational Inference
Variational Inference is a powerful tool in the Bayesian modeling toolkit, however, its effectiveness is determined by the expressivity of the utilized variational distributions in terms of their ability to match the true posterior distribution. In turn, the expressivity of the variational family is largely limited by the requirement of having a tractable density function. To overcome this roadblock, we introduce a new family of variational upper bounds on a marginal log-density in the case of hierarchical models (also known as latent variable models). We then derive a family of increasingly tighter variational lower bounds on the otherwise intractable standard evidence lower bound for hierarchical variational distributions, enabling the use of more expressive approximate posteriors. We show that previously known methods, such as Hierarchical Variational Models, Semi-Implicit Variational Inference and Doubly Semi-Implicit Variational Inference can be seen as special cases of the proposed approach, and empirically demonstrate superior performance of the proposed method in a set of experiments.
Review for NeurIPS paper: Learning to Learn Variational Semantic Memory
Correctness: As mentioned above, I am a bit skeptical about the technical correctness for the variational inference framework. Specifically, - I think the latent z in Eq.(2) does not properly represent the class prototypes as z is conditioned on each individual x, not a entire class set (But on the other hand, Figure 1 shows that the latent z is conditioned on each of the class sets, and I'm confused which one is right). I don't understand how the approximate posterior q(z S) can have dependency on S, because according to the generative process defined by Eq.(2), the true posterior p(z x,y) does not have the dependency on the entire class set S except for each individual point (x,y). If it is not included, then the inference of m should be based on semi-implicit variational inference [2,3] as the intermediate stochastic variable m is only for the approximate posterior. However, such a discussion has not been discussed in the paper and the ELBO expression Eq.(13) seems not to represent the SIVI procedure as well.
Importance Weighted Hierarchical Variational Inference
Variational Inference is a powerful tool in the Bayesian modeling toolkit, however, its effectiveness is determined by the expressivity of the utilized variational distributions in terms of their ability to match the true posterior distribution. In turn, the expressivity of the variational family is largely limited by the requirement of having a tractable density function. To overcome this roadblock, we introduce a new family of variational upper bounds on a marginal log-density in the case of hierarchical models (also known as latent variable models). We then derive a family of increasingly tighter variational lower bounds on the otherwise intractable standard evidence lower bound for hierarchical variational distributions, enabling the use of more expressive approximate posteriors. We show that previously known methods, such as Hierarchical Variational Models, Semi-Implicit Variational Inference and Doubly Semi-Implicit Variational Inference can be seen as special cases of the proposed approach, and empirically demonstrate superior performance of the proposed method in a set of experiments.
Hierarchical Semi-Implicit Variational Inference with Application to Diffusion Model Acceleration
Yu, Longlin, Xie, Tianyu, Zhu, Yu, Yang, Tong, Zhang, Xiangyu, Zhang, Cheng
Semi-implicit variational inference (SIVI) has been introduced to expand the analytical variational families by defining expressive semi-implicit distributions in a hierarchical manner. However, the single-layer architecture commonly used in current SIVI methods can be insufficient when the target posterior has complicated structures. In this paper, we propose hierarchical semi-implicit variational inference, called HSIVI, which generalizes SIVI to allow more expressive multi-layer construction of semi-implicit distributions. By introducing auxiliary distributions that interpolate between a simple base distribution and the target distribution, the conditional layers can be trained by progressively matching these auxiliary distributions one layer after another. Moreover, given pre-trained score networks, HSIVI can be used to accelerate the sampling process of diffusion models with the score matching objective. We show that HSIVI significantly enhances the expressiveness of SIVI on several Bayesian inference problems with complicated target distributions. When used for diffusion model acceleration, we show that HSIVI can produce high quality samples comparable to or better than the existing fast diffusion model based samplers with a small number of function evaluations on various datasets.